Abstract

The Resource Description Framework (RDF) represents information as subject-predicate-object triples. These triples are commonly interpreted as a directed labelled graph. We propose an alternative approach, interpreting the data as a 3-way Boolean tensor. We show how SPARQL queries (the standard queries for RDF) can be expressed as elementary operations in Boolean algebra, giving us a complete reinterpretation of RDF and SPARQL. We show how the Boolean tensor interpretation allows for new optimizations and analyses of the complexity of SPARQL queries. For example, estimating the size of the results for different join queries becomes much simpler.

1 Introduction 

The Resource Description Framework (RDF)¹ is a W3C standard for representing information on the web. RDF data consist of subject-predicate-object $(s, p, o)$ triples that are commonly treated as a directed labelled graph. Edges in the RDF graph go from subjects to objects, and predicates form the edge labels. RDF data can be queried with the SPARQL query language². The labelled directed graph interpretation of RDF data allows some SPARQL operations to be defined in an intuitive way. It is not always the most convenient theoretical framework to work with, however. For example, the $(s, p, o)$ triples treat the predicate no differently from the subject or object, yet in the graph interpretation the predicate acts as the edge, while the subject and object are nodes.

In this paper we propose to approach RDF data as a 3-way Boolean tensor, defining the SPARQL operations using elementary tensor and matrix operations. Seeing RDF data as a tensor is nothing new. To the best of our knowledge, however, we present the first comprehensive analysis of SPARQL in terms of Boolean tensor operations. There is prior work on efficiently implementing specific SPARQL operations using binary tensors (see Section 7).

We want to emphasize that we do not propose that efficient RDF databases should be built on top of the tensor interpretation. We do think, however, that the tensor interpretation makes certain optimization techniques more intuitive than the graph interpretation does. This will hopefully yield concrete benefits for query processing, for example, by providing more efficient cardinality estimators for joins.

After a brief introduction to tensor notation and terminology (Section 2), we explain the data model and how SPARQL queries can be evaluated directly using tensor algebra (Sections 3 and 4). For the sake of clarity, we present the correspondence between SPARQL and tensor algebra using examples. We then study the results we can get using our tensor formulation. First we study the computation of joins and the estimation of their cardinalities (Section 5), before moving on to tensor decompositions (Section 6) and their properties. Throughout these two sections, we list a number of propositions. Many of them are either straightforward or have been proved earlier. Our goal in presenting these results is to show what kind of results our framework facilitates. We cover some related work, interesting future directions, and conclusions in the last three sections.

¹ http://www.w3.org/TR/rdf-syntax-grammar
² http://www.w3.org/TR/rdf-sparql-query
2 Notation


Throughout this paper, we indicate vectors as bold lower-case letters ($\mathbf{v}$), matrices as bold upper-case letters ($\mathbf{M}$), and tensors as bold upper-case calligraphic letters ($\mathcal{T}$).

Element $(i, j, k)$ of a 3-way tensor $\mathcal{X}$ is denoted as $x_{ijk}$. A colon in a subscript denotes taking that mode entirely. For example, $\mathcal{X}_{::k}$ is the $k$th frontal slice of $\mathcal{X}$ ($\mathbf{X}_k$ in short), and $\mathbf{x}_{:jk}$ is a fibre along the first mode of $\mathcal{X}$.

For a 3-way tensor $\mathcal{X}$, $\mathbf{x}_{:jk}$ is the mode-1 (column) fibre, $\mathbf{x}_{i:k}$ is the mode-2 (row) fibre, and $\mathbf{x}_{ij:}$ is the mode-3 (tube) fibre.

A tensor can be unfolded into a matrix by arranging its fibres as the columns of a matrix. The mode-$i$ matricization of an $n$-by-$m$-by-$l$ tensor $\mathcal{T}$, denoted $\mathbf{T}_{(i)}$, takes the mode-$i$ fibres of $\mathcal{T}$ and arranges them as the columns of the matrix $\mathbf{T}_{(i)}$. For example, in the mode-1 unfolding, the columns of the frontal slices $\mathbf{T}_k$ (the mode-1 fibres) constitute the columns of $\mathbf{T}_{(1)}$, which has $n$ rows and $ml$ columns.

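The outer product of vectors in $N$ modes is denoted by $\boxtimes$. That is, if $\mathbf{a}$, $\mathbf{b}$, and $\mathbf{c}$ are vectors of length $m$, $n$, and $l$, respectively, then $\mathcal{X} = \mathbf{a} \boxtimes \mathbf{b} \boxtimes \mathbf{c}$ is an $m$-by-$n$-by-$l$ tensor with $x_{ijk} = a_i b_j c_k$.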

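The Boolean tensor sum of binary tensors $\mathcal{X}$ and $\mathcal{Y}$ is defined as $(\mathcal{X} \vee \mathcal{Y})_{ijk} = x_{ijk} \vee y_{ijk}$. Let $\mathbf{X}$ and $\mathbf{Y}$ be binary matrices, where $\mathbf{X}$ has $r$ columns and $\mathbf{Y}$ has $r$ rows. Their Boolean matrix product $\mathbf{X} \circ \mathbf{Y}$ is defined as $(\mathbf{X} \circ \mathbf{Y})_{ij} = \bigvee_{k=1}^{r} x_{ik} y_{kj}$.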
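The two Boolean operations just defined can be sketched in a few lines of Python. The sketch makes an illustrative assumption that the text does not prescribe: binary matrices are stored as lists of rows holding 0/1 integers. For brevity, the Boolean sum is shown only for the 2-way (matrix) case.

```python
# Sketch only: binary matrices are lists of rows holding 0/1 ints.

def boolean_sum(X, Y):
    """Element-wise OR of two equally sized binary matrices (2-way Boolean sum)."""
    return [[x | y for x, y in zip(row_x, row_y)] for row_x, row_y in zip(X, Y)]


def boolean_matmul(X, Y):
    """Boolean matrix product: (X o Y)_ij = OR_k (x_ik AND y_kj)."""
    r = len(Y)  # X must have r columns, Y must have r rows
    if any(len(row) != r for row in X):
        raise ValueError("X needs as many columns as Y has rows")
    m = len(Y[0])
    return [[int(any(row[k] and Y[k][j] for k in range(r))) for j in range(m)]
            for row in X]


X = [[1, 0], [1, 1]]
Y = [[0, 1], [1, 0]]
print(boolean_sum(X, Y))     # [[1, 1], [1, 1]]
print(boolean_matmul(X, Y))  # [[0, 1], [1, 1]]
```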

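Let $\mathbf{X}$ be an $n_1$-by-$m_1$ matrix and $\mathbf{Y}$ be an $n_2$-by-$m_2$ matrix. Their Kronecker (matrix) product is the $n_1 n_2$-by-$m_1 m_2$ matrix $\mathbf{X} \otimes \mathbf{Y}$ defined by

$$\mathbf{X} \otimes \mathbf{Y} = \begin{pmatrix} x_{11}\mathbf{Y} & x_{12}\mathbf{Y} & \cdots & x_{1m_1}\mathbf{Y} \\ x_{21}\mathbf{Y} & x_{22}\mathbf{Y} & \cdots & x_{2m_1}\mathbf{Y} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n_1 1}\mathbf{Y} & x_{n_1 2}\mathbf{Y} & \cdots & x_{n_1 m_1}\mathbf{Y} \end{pmatrix}. \qquad (1)$$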


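The Khatri-Rao (matrix) product of $\mathbf{X}$ and $\mathbf{Y}$ is defined as the "column-wise Kronecker" product. That is, $\mathbf{X}$ and $\mathbf{Y}$ must have the same number of columns ($m_1 = m_2 = m$). Their Khatri-Rao product $\mathbf{X} \odot \mathbf{Y}$ is the $n_1 n_2$-by-$m$ matrix defined as

$$\mathbf{X} \odot \mathbf{Y} = (\mathbf{x}_1 \otimes \mathbf{y}_1, \mathbf{x}_2 \otimes \mathbf{y}_2, \ldots, \mathbf{x}_m \otimes \mathbf{y}_m) = \begin{pmatrix} x_{11}\mathbf{y}_1 & x_{12}\mathbf{y}_2 & \cdots & x_{1m}\mathbf{y}_m \\ x_{21}\mathbf{y}_1 & x_{22}\mathbf{y}_2 & \cdots & x_{2m}\mathbf{y}_m \\ \vdots & \vdots & \ddots & \vdots \\ x_{n_1 1}\mathbf{y}_1 & x_{n_1 2}\mathbf{y}_2 & \cdots & x_{n_1 m}\mathbf{y}_m \end{pmatrix}. \qquad (2)$$

Notice that if $\mathbf{X}$ and $\mathbf{Y}$ are binary, so are $\mathbf{X} \otimes \mathbf{Y}$ and $\mathbf{X} \odot \mathbf{Y}$.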
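Both products can be sketched directly from Equations (1) and (2). The sketch again assumes that binary matrices are lists of rows of 0/1 integers. The example inputs are arbitrary, and both results stay binary.

```python
# Sketch only: binary matrices are lists of rows holding 0/1 ints.

def kronecker(X, Y):
    """Kronecker product: the (i, j) block of the result is x_ij * Y."""
    return [[x * y for x in row_x for y in row_y] for row_x in X for row_y in Y]


def khatri_rao(X, Y):
    """Column-wise Kronecker product; X and Y must have the same number of columns."""
    m = len(X[0])
    if len(Y[0]) != m:
        raise ValueError("X and Y need the same number of columns")
    # Row i * len(Y) + p of the result holds x_ic * y_pc in column c.
    return [[row_x[c] * row_y[c] for c in range(m)] for row_x in X for row_y in Y]


X = [[1, 0], [1, 1]]
Y = [[1, 1], [0, 1]]
print(kronecker(X, Y))   # [[1, 1, 0, 0], [0, 1, 0, 0], [1, 1, 1, 1], [0, 1, 0, 1]]
print(khatri_rao(X, Y))  # [[1, 0], [0, 0], [1, 1], [0, 1]]
```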


3 Data Model 

Given an RDF graph $T$, let $S(T)$, $P(T)$, and $O(T)$ denote the sets of distinct subjects, predicates, and objects, respectively. The number of distinct subjects is denoted by $|S(T)|$, or $|S|$ in short. Correspondingly, $|P(T)| = |P|$ and $|O(T)| = |O|$ denote the numbers of predicates and objects. We use the shorthand notation only in unambiguous cases, for instance for a maximal index occurring in a subscript, or when the cardinality is common to all RDF graphs under discussion.

To represent RDF data by a binary tensor, we enumerate all subjects, predicates, and objects to obtain mappings from the items in $S(T)$, $P(T)$, and $O(T)$ to indices. Let $s_i$ denote the $i$th subject with the corresponding index $i = 1, \ldots, |S|$. Respectively, let $p_j$ be the $j$th predicate with $j = 1, \ldots, |P|$ and $o_k$ the $k$th object with $k = 1, \ldots, |O|$.

With this mapping we can represent any RDF graph $T$ as a 3-way binary $|S|$-by-$|P|$-by-$|O|$ tensor $\mathcal{T}$. An element $(i, j, k)$ of $\mathcal{T}$ is 1 if and only if the respective subject-predicate-object triple $(s_i, p_j, o_k)$ is present in the RDF graph $T$.
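The mapping can be sketched as follows, under several assumptions. The text allows any enumeration, and sorting the labels is an arbitrary choice made here. The function name `rdf_to_tensor` and the toy graph are hypothetical. Indices are 0-based, whereas the text counts from 1.

```python
# Sketch: build the |S|-by-|P|-by-|O| binary tensor of an RDF graph.
# Sorting the labels is an arbitrary (hypothetical) choice of enumeration;
# indices are 0-based here, whereas the text counts from 1.

def rdf_to_tensor(triples):
    subjects = sorted({s for s, _, _ in triples})
    predicates = sorted({p for _, p, _ in triples})
    objects = sorted({o for _, _, o in triples})
    s_idx = {s: i for i, s in enumerate(subjects)}
    p_idx = {p: j for j, p in enumerate(predicates)}
    o_idx = {o: k for k, o in enumerate(objects)}
    # T[i][j][k] == 1 iff (subjects[i], predicates[j], objects[k]) is in the graph.
    T = [[[0] * len(objects) for _ in predicates] for _ in subjects]
    for s, p, o in triples:
        T[s_idx[s]][p_idx[p]][o_idx[o]] = 1
    return T, subjects, predicates, objects


graph = [
    ("alice", "knows", "bob"),
    ("alice", "likes", "pizza"),
    ("bob", "knows", "carol"),
    ("carol", "likes", "pizza"),
]
T, S, P, O = rdf_to_tensor(graph)
print(S, P, O)  # ['alice', 'bob', 'carol'] ['knows', 'likes'] ['bob', 'carol', 'pizza']
print(T[0])     # [[1, 0, 0], [0, 0, 1]]  (horizontal slice of subject 'alice')
```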
4 SPARQL Queries

Simple SPARQL queries consist of two parts, a SELECT clause and a WHERE clause. The SELECT clause 
identifies the variables to appear in the query result. The WHERE clause provides the basic graph pattern 
to match against the RDF data. A basic graph pattern is a set of triple patterns that matches a subgraph 
of the RDF data. Triple patterns are subject-predicate-object triples in which a variable can take the place of any specific subject, predicate, or object.

An example of a simple SPARQL query that has a single triple pattern as its basic graph pattern is SELECT * WHERE {?a T:$p_j$ T:$o_k$}. The keyword SELECT acts as a projection operator. It identifies the variables to appear in the query result. In this case, we ask for all variables that have been defined. The triple pattern in the WHERE part of the query has a variable for the subject, ?a, indicated by the prepended ?. It also has a fixed predicate $p_j$ and a fixed object $o_k$ from RDF graph $T$. It matches all RDF triples of $T$ that have predicate $p_j$ and object $o_k$.

If the RDF data is represented as a binary 3-way tensor $\mathcal{T}$, the triple pattern selects the fibre $\mathbf{t}_{:jk}$, that is, a vector of all subjects with predicate $p_j$ and object $o_k$. This vector has a 1 at the positions $i$ that correspond to an RDF triple $(s_i, p_j, o_k)$ present in the RDF graph $T$.

A slice of $\mathcal{T}$ would be selected if only one mode was fixed by the query. A query SELECT * WHERE {?a T:$p_j$ ?b}, for instance, corresponds to the slice $\mathbf{T}_{:j:}$. If we change the projection to SELECT ?a, we would get the same number of results. Only the row indices (subjects) of the non-zero positions in the slice would appear in the final solution sequence, while the column indices (objects) remain hidden.
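Fibre and slice selection can be sketched on a small hand-made tensor. The graph, the labels, and the helper names `fibre_1` and `slice_2` are hypothetical, and indices are 0-based. The projected answer keeps one entry per non-zero, so it has as many solutions as the full answer.

```python
# Sketch: triple patterns as fibre / slice selections on a small hand-made tensor.
subjects = ["alice", "bob", "carol"]
predicates = ["knows", "likes"]
objects = ["bob", "carol", "pizza"]
graph = {("alice", "knows", "bob"), ("alice", "likes", "pizza"),
         ("bob", "knows", "carol"), ("carol", "likes", "pizza")}
T = [[[int((s, p, o) in graph) for o in objects] for p in predicates] for s in subjects]


def fibre_1(T, j, k):
    """Mode-1 fibre t_{:jk}: one entry per subject."""
    return [T[i][j][k] for i in range(len(T))]


def slice_2(T, j):
    """Slice T_{:j:}: an |S|-by-|O| matrix (rows = subjects, columns = objects)."""
    return [T[i][j] for i in range(len(T))]


# SELECT * WHERE {?a T:likes T:pizza}
fib = fibre_1(T, predicates.index("likes"), objects.index("pizza"))
print([subjects[i] for i, v in enumerate(fib) if v])  # ['alice', 'carol']

# SELECT * WHERE {?a T:knows ?b}: every non-zero of the slice is one solution
M = slice_2(T, predicates.index("knows"))
pairs = [(subjects[i], objects[k]) for i, row in enumerate(M) for k, v in enumerate(row) if v]
# SELECT ?a WHERE {?a T:knows ?b}: same number of solutions, column index dropped
only_a = [subjects[i] for i, row in enumerate(M) for v in row if v]
print(pairs)   # [('alice', 'bob'), ('bob', 'carol')]
print(only_a)  # ['alice', 'bob']
```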

4.1 Basic Graph Patterns — Join 

A basic graph pattern, in which a set of triple patterns must match, can be understood as a join operation. Consider, for example, a basic graph pattern consisting of two triple patterns, {?a T:$p_i$ ?b} . {?c U:$p_j$ ?b}. This queries for all RDF triples in which the object ?b is linked to a subject by predicate $p_i$ of RDF graph $T$ as well as by predicate $p_j$ of RDF graph $U$, where $i \in \{1, \ldots, |P(T)|\}$ and $j \in \{1, \ldots, |P(U)|\}$.

Note that the braces in this example are only used to increase readability and could be omitted. For more complex queries, however, curly braces define group graph patterns and hence the processing order.

In Boolean tensor algebra, the triple patterns from the example above correspond to the slices $\mathbf{T}_{:i:}$ and $\mathbf{U}_{:j:}$ of the RDF tensors $\mathcal{T}$ and $\mathcal{U}$. A join operation on equal objects is equivalent to the Khatri-Rao product of $\mathbf{T}_{:i:}$ and $\mathbf{U}_{:j:}$. However, to compute the Khatri-Rao product, both slices must have the same number of columns. To obtain a meaningful result, their column labels must also be in the same order.

Therefore, if the objects of $\mathcal{T}$ and $\mathcal{U}$ do not map one-to-one, all-zero slices are appended to $\mathcal{U}$ for the object labels of $T$ that have no corresponding object label in $U$, and vice versa, such that $|O(T)| = |O(U)|$. Furthermore, the object labels of $\mathcal{T}$ and $\mathcal{U}$ need to be in a common order.

The result of the basic graph pattern is a matrix of size $|S(T)||S(U)|$-by-$|O|$,

$$\mathbf{T}_{:i:} \odot \mathbf{U}_{:j:} = \left(\mathbf{t}_{:i1} \otimes \mathbf{u}_{:j1}, \mathbf{t}_{:i2} \otimes \mathbf{u}_{:j2}, \ldots, \mathbf{t}_{:i|O|} \otimes \mathbf{u}_{:j|O|}\right) = \begin{pmatrix} t_{1i1}\mathbf{u}_{:j1} & t_{1i2}\mathbf{u}_{:j2} & \cdots & t_{1i|O|}\mathbf{u}_{:j|O|} \\ t_{2i1}\mathbf{u}_{:j1} & t_{2i2}\mathbf{u}_{:j2} & \cdots & t_{2i|O|}\mathbf{u}_{:j|O|} \\ \vdots & \vdots & \ddots & \vdots \\ t_{|S(T)|i1}\mathbf{u}_{:j1} & t_{|S(T)|i2}\mathbf{u}_{:j2} & \cdots & t_{|S(T)|i|O|}\mathbf{u}_{:j|O|} \end{pmatrix}. \qquad (3)$$

This matrix has non-zeros where an object has corresponding subjects under both predicate $p_i$ and predicate $p_j$. It can be regarded as $|S(T)|$ blocks of $|S(U)|$-by-$|O|$ matrices stacked on top of each other. Each block corresponds to a subject ?a from $T$, and each row within a block corresponds to a subject ?c from $U$. This view enables a straightforward conversion from the Khatri-Rao product to the RDF triples. Instead of numbering the rows from 1 to $|S(T)||S(U)|$, we refer to each row by its block index and its row index within the block. These indices link to both subjects, and the corresponding object is encoded by the column index.

There are more options to join triple patterns. Instead of binding the objects, we could bind any of the following:
- the subjects, as in {?a T:$p_i$ ?b} . {?a U:$p_j$ ?c};
- both, as in {?a T:$p_i$ ?b} . {?a U:$p_j$ ?b};
- none, as in {?a T:$p_i$ ?b} . {?c U:$p_j$ ?d};
- the subject of one pattern with the object of the other, as in {?a T:$p_i$ ?b} . {?c U:$p_j$ ?a}.

It is also possible to fix the subjects or objects instead of the predicates, or both of them, or none. In the remainder of this section, we examine these options in detail and see what their Boolean tensor algebra counterparts look like. For this, we assume that the preprocessing step of matching the labels in the modes to be joined has already been done.
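The object join of Equation (3) and its block-based decoding can be sketched in Python. The slices below are hypothetical and are assumed to share one object ordering already. Each non-zero of the Khatri-Rao product is decoded into (block index, row within block, column), that is, into (?a, ?c, ?b), with 0-based indices.

```python
# Sketch of {?a T:p_i ?b} . {?c U:p_j ?b} as T_{:i:} (Khatri-Rao) U_{:j:}.
# The slices are hypothetical and assumed to share one object ordering already.

def khatri_rao(X, Y):
    m = len(X[0])
    if len(Y[0]) != m:
        raise ValueError("slices must have the same number of columns (objects)")
    return [[row_x[c] * row_y[c] for c in range(m)] for row_x in X for row_y in Y]


T_i = [[1, 0, 1],   # subject a0 of T
       [0, 1, 0]]   # subject a1 of T
U_j = [[1, 0, 0],   # subject c0 of U
       [0, 0, 1]]   # subject c1 of U

K = khatri_rao(T_i, U_j)  # |S(T)||S(U)|-by-|O|
n_c = len(U_j)            # block height = |S(U)|
solutions = [
    (r // n_c, r % n_c, col)   # (block index -> ?a, row in block -> ?c, column -> ?b)
    for r, row in enumerate(K) for col, v in enumerate(row) if v
]
print(solutions)  # [(0, 0, 0), (0, 1, 2)]
```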


4.1.1 One Each Fixed, One Bound 


The first option for joining two triple patterns that we examine is the case where one variable in each pattern is fixed, like the predicate in the example above, and one is bound. The bound variable is indicated by a common name in both triple patterns of the query. To be part of the result, RDF triples from both triple patterns must agree on the value of the bound variable. This means we obtain all pairs of triples that share a common label in the dimension of the bound variable.

Suppose we fix the predicates T:$p_i$ and U:$p_j$ in the two triple patterns, where $i \in \{1, \ldots, |P(T)|\}$ and $j \in \{1, \ldots, |P(U)|\}$. This leaves us with four options for treating the subject and the object:

1. {?a T:$p_i$ ?b} . {?a U:$p_j$ ?c}
2. {?a T:$p_i$ ?b} . {?c U:$p_j$ ?b}
3. {?a T:$p_i$ ?b} . {?c U:$p_j$ ?a}
4. {?a T:$p_i$ ?b} . {?b U:$p_j$ ?c}


Besides the predicate, we can fix the subjects, the objects, or a combination of them. This gives nine options for each of the above combinations, so in total there are 36 different ways to perform a join with one variable bound and one variable fixed in each pattern. When calculating the Khatri-Rao product, however, there are fewer cases to distinguish. Fixing a variable corresponds to selecting a slice of the RDF tensor. A slice is a matrix, and hence we only care whether the join is performed on the rows or on the columns of each matrix. We have already seen the case where the columns of both matrices are joined in the example above. Joining the rows of both matrices can be accomplished by using the transposes of the matrices and transposing the Khatri-Rao product. Thus, a join on the subjects while the predicates are fixed amounts to


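$$\left((\mathbf{T}_{:i:})^T \odot (\mathbf{U}_{:j:})^T\right)^T = \left(\mathbf{t}_{1i:} \otimes \mathbf{u}_{1j:}, \mathbf{t}_{2i:} \otimes \mathbf{u}_{2j:}, \ldots, \mathbf{t}_{|S|i:} \otimes \mathbf{u}_{|S|j:}\right)^T = \begin{pmatrix} t_{1i1}\mathbf{u}_{1j:}^T & t_{1i2}\mathbf{u}_{1j:}^T & \cdots & t_{1i|O(T)|}\mathbf{u}_{1j:}^T \\ t_{2i1}\mathbf{u}_{2j:}^T & t_{2i2}\mathbf{u}_{2j:}^T & \cdots & t_{2i|O(T)|}\mathbf{u}_{2j:}^T \\ \vdots & \vdots & \ddots & \vdots \\ t_{|S|i1}\mathbf{u}_{|S|j:}^T & t_{|S|i2}\mathbf{u}_{|S|j:}^T & \cdots & t_{|S|i|O(T)|}\mathbf{u}_{|S|j:}^T \end{pmatrix}. \qquad (4)$$

The result is an $|S|$-by-$|O(T)||O(U)|$ matrix. Each row corresponds to a subject ?a, and the columns form $|O(T)|$ blocks of $|O(U)|$ objects.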


Analogously, to perform a join between the columns of the first matrix and the rows of the second, we compute $\mathbf{T}_{:i:} \odot (\mathbf{U}_{:j:})^T$. This gives an $|S(T)||O(U)|$-by-$|S(U)|$ matrix whose rows we interpret as $|S(T)|$ blocks of $|O(U)|$ objects. To compute the Khatri-Rao product, the number of columns (that is, the dimension on which we join) must match. Hence we require $|S(U)| = |O(T)|$ in this case.

Likewise, to join between the rows of the first matrix and the columns of the second, we need to evaluate $(\mathbf{T}_{:i:})^T \odot \mathbf{U}_{:j:}$. This yields an $|O(T)||S(U)|$-by-$|S(T)|$ matrix, where we interpret the rows as $|O(T)|$ blocks of $|S(U)|$ subjects.
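The subject join of Equation (4) can be sketched with explicit transposes. The slices are hypothetical, and both are assumed to use the same subject ordering. Each non-zero in row $a$ is decoded into the object of $T$ (column block) and the object of $U$ (position within the block). In the example, row 0 equals the Kronecker product of the two subject-0 rows, as Equation (4) states.

```python
# Sketch of eq. (4): {?a T:p_i ?b} . {?a U:p_j ?c} as ((T_{:i:})^T KR (U_{:j:})^T)^T.
# Hypothetical slices; both are assumed to use the same subject ordering.

def transpose(M):
    return [list(col) for col in zip(*M)]


def khatri_rao(X, Y):
    m = len(X[0])
    if len(Y[0]) != m:
        raise ValueError("need the same number of columns")
    return [[row_x[c] * row_y[c] for c in range(m)] for row_x in X for row_y in Y]


T_i = [[1, 1], [0, 1], [1, 0]]           # |S|-by-|O(T)|
U_j = [[0, 1, 1], [1, 0, 0], [0, 0, 0]]  # |S|-by-|O(U)|

R = transpose(khatri_rao(transpose(T_i), transpose(U_j)))  # |S|-by-|O(T)||O(U)|
n_c = len(U_j[0])  # each block of columns has |O(U)| entries
solutions = [(a, col // n_c, col % n_c)  # (?a, ?b, ?c)
             for a, row in enumerate(R) for col, v in enumerate(row) if v]
print(R[0])       # [0, 1, 1, 0, 1, 1]
print(solutions)  # [(0, 0, 1), (0, 0, 2), (0, 1, 1), (0, 1, 2), (1, 1, 0)]
```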


4.1.2 Two Each Fixed, One Bound 

As soon as two variables in a triple pattern are fixed, we select a vector from the corresponding RDF tensor $\mathcal{T}$. The triples described by the pattern {T:$s_i$ T:$p_j$ ?a}, for example, correspond to the non-zero entries in $\mathbf{t}_{ij:}$. Consider a query like {T:$s_i$ T:$p_j$ ?a} . {U:$s_k$ U:$p_l$ ?a}, where $i \in \{1, \ldots, |S(T)|\}$, $k \in \{1, \ldots, |S(U)|\}$, $j \in \{1, \ldots, |P(T)|\}$, and $l \in \{1, \ldots, |P(U)|\}$. It returns those triples for which both vectors have a 1 at a common position, i.e., $\mathbf{t}_{ij:} \wedge \mathbf{u}_{kl:}$.

If we interpret the vectors $\mathbf{t}_{ij:}$ and $\mathbf{u}_{kl:}$ as two 1-by-$|O|$ matrices, we can use the Khatri-Rao product $\mathbf{t}_{ij:} \odot \mathbf{u}_{kl:}$ to express the same operation. Analogously, we can fix any two variables in each triple pattern and bind the remaining one. The Khatri-Rao product also makes it easy to join a triple pattern with two fixed variables and a triple pattern with only one fixed variable. The query {T:$s_i$ T:$p_j$ ?a} . {?b U:$p_l$ ?a}, for instance, yields a $1 \cdot |S(U)|$-by-$|O|$ result.
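The claim that a Khatri-Rao product of two 1-by-$|O|$ matrices equals the element-wise AND can be checked numerically. The vectors and the slice below are hypothetical data. The last line shows the mixed case: a 1-by-$|O|$ vector against an $|S(U)|$-by-$|O|$ slice gives a $1 \cdot |S(U)|$-by-$|O|$ result.

```python
# Sketch: with two positions fixed, each pattern is a vector over the objects.
# Treating the vectors as 1-by-|O| matrices, the Khatri-Rao product equals the AND.

def khatri_rao(X, Y):
    m = len(X[0])
    if len(Y[0]) != m:
        raise ValueError("need the same number of columns")
    return [[row_x[c] * row_y[c] for c in range(m)] for row_x in X for row_y in Y]


t_ij = [1, 0, 1, 1]  # objects o with (s_i, p_j, o) in T (hypothetical)
u_kl = [0, 0, 1, 1]  # objects o with (s_k, p_l, o) in U (hypothetical)

print([a & b for a, b in zip(t_ij, u_kl)])  # [0, 0, 1, 1]
print(khatri_rao([t_ij], [u_kl]))           # [[0, 0, 1, 1]]

# {T:s_i T:p_j ?a} . {?b U:p_l ?a}: 1-by-|O| vector against an |S(U)|-by-|O| slice
U_l = [[1, 0, 0, 1], [0, 1, 1, 0]]
print(khatri_rao([t_ij], U_l))              # [[1, 0, 0, 1], [0, 0, 1, 0]]
```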
4.1.3 One Fixed, One Bound


It is also possible to fix only one variable in one of the triple patterns. We now examine the Boolean tensor algebra version of a query like {?a T:$p_j$ ?c} . {?d ?b ?c} with $j \in \{1, \ldots, |P(T)|\}$. This query asks for all triples with predicate T:$p_j$ that have an object in common with any other triple in the RDF data. In the RDF tensor representation, this amounts to a column-wise matrix-tensor multiplication. The left triple pattern is represented by the matrix $\mathbf{T}_{:j:}$. The right triple pattern is represented by the slices $\mathbf{U}_{:k:}$ of $\mathcal{U}$ along the predicates, $k = 1, \ldots, |P(U)|$. The multiplication is achieved by computing the Khatri-Rao product of $\mathbf{T}_{:j:}$ with each of these slices. The outcome is then a 3-way tensor with $|P(U)|$ slices, each of size $|S(T)||S(U)|$-by-$|O|$.

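Using the matricization of $\mathcal{U}$ along the mode to be joined (here the object, and hence the third mode), we can express the calculation described above concisely as

$$\mathbf{T}_{:j:} \odot \mathbf{U}_{(3)} = \left(\mathbf{t}_{:j1} \otimes \mathbf{u}_{(3):1}, \mathbf{t}_{:j2} \otimes \mathbf{u}_{(3):2}, \ldots, \mathbf{t}_{:j|O|} \otimes \mathbf{u}_{(3):|O|}\right) = \begin{pmatrix} t_{1j1}\mathbf{u}_{(3):1} & t_{1j2}\mathbf{u}_{(3):2} & \cdots & t_{1j|O|}\mathbf{u}_{(3):|O|} \\ t_{2j1}\mathbf{u}_{(3):1} & t_{2j2}\mathbf{u}_{(3):2} & \cdots & t_{2j|O|}\mathbf{u}_{(3):|O|} \\ \vdots & \vdots & \ddots & \vdots \\ t_{|S|j1}\mathbf{u}_{(3):1} & t_{|S|j2}\mathbf{u}_{(3):2} & \cdots & t_{|S|j|O|}\mathbf{u}_{(3):|O|} \end{pmatrix}, \qquad (5)$$

Here $\mathbf{U}_{(3)}$ is the $|P(U)||S(U)|$-by-$|O|$ matrix whose rows are the mode-3 fibres of $\mathcal{U}$ (the transpose of the mode-3 unfolding as defined in Section 2), and $\mathbf{u}_{(3):k}$ denotes its $k$th column. The result is thus of size $|S(T)||P(U)||S(U)|$-by-$|O|$. To translate back to an RDF graph, the first dimension is regarded as $|S(T)|$ blocks, each with $|P(U)|$ blocks of $|S(U)|$ items. This view lets us address every entry in the result matrix with a tuple $(s_i, p_j, s_k, o_l)$, where $i \in \{1, \ldots, |S(T)|\}$, $j \in \{1, \ldots, |P(U)|\}$, $k \in \{1, \ldots, |S(U)|\}$, and $l \in \{1, \ldots, |O|\}$. The positions of the non-zeros addressed in this way answer the RDF query.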

Of course, if a similar query is posed with the subjects bound instead of the objects, we need to apply the transpose before computing the Khatri-Rao product and transpose the outcome as well. This is similar to the case of one variable fixed in each triple pattern discussed above.
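Equation (5) and its nested block decoding can be sketched as follows. The sketch assumes that the rows of $\mathbf{U}_{(3)}$ are the fibres $\mathbf{u}_{sp:}$, ordered as $|P(U)|$ blocks of $|S(U)|$ rows to match the block interpretation in the text. The tensor and slice are hypothetical, and all indices are 0-based.

```python
# Sketch of eq. (5): {?a T:p_j ?c} . {?d ?b ?c} as T_{:j:} KR U_(3).
# Assumption: U_(3) has the mode-3 fibres u_{sp:} of U as rows, ordered as
# |P(U)| blocks of |S(U)| rows, matching the block interpretation in the text.

def khatri_rao(X, Y):
    m = len(X[0])
    if len(Y[0]) != m:
        raise ValueError("need the same number of columns")
    return [[row_x[c] * row_y[c] for c in range(m)] for row_x in X for row_y in Y]


T_j = [[1, 0], [0, 1]]        # |S(T)|-by-|O| slice (hypothetical)
U = [[[1, 0], [0, 0]],        # U[s][p] is the fibre u_{sp:} (hypothetical data)
     [[0, 0], [1, 1]]]
n_s, n_p = len(U), len(U[0])
U3 = [U[s][p] for p in range(n_p) for s in range(n_s)]  # predicate-major rows

K = khatri_rao(T_j, U3)       # |S(T)||P(U)||S(U)|-by-|O|
block = n_p * n_s
solutions = [
    (r // block, (r % block) // n_s, r % n_s, col)  # (s_i of T, p_j of U, s_k of U, o_l)
    for r, row in enumerate(K) for col, v in enumerate(row) if v
]
print(solutions)  # [(0, 0, 0, 0), (0, 1, 1, 0), (1, 1, 1, 1)]
```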

4.1.4 Some Fixed, None Bound 

We now examine joins between triple patterns where none of the variables is bound. Technically, these are no longer joins. The simplest such case worth looking at is where two variables are fixed in each of the triple patterns. There is also the case where all three variables are fixed, but this amounts to no more than checking whether two RDF triples are equal.

An example of a query with two variables fixed in each triple pattern is {T:$s_i$ T:$p_j$ ?a} . {U:$s_k$ U:$p_l$ ?b}, where $i \in \{1, \ldots, |S(T)|\}$, $j \in \{1, \ldots, |P(T)|\}$, $k \in \{1, \ldots, |S(U)|\}$, and $l \in \{1, \ldots, |P(U)|\}$. The expected result of that query is a list of all combinations of the triples from the left pattern with the triples from the right pattern. As the triples in each pattern differ only in their objects, we are looking for all possible combinations of objects matching the left triple pattern and objects matching the right triple pattern. We obtain these by taking the outer product of $\mathbf{t}_{ij:}$ and $\mathbf{u}_{kl:}$, that is, $\mathbf{t}_{ij:} \boxtimes \mathbf{u}_{kl:}$.

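Similarly, consider a join query with one variable fixed in each triple pattern, such as {?a T:$p_j$ ?b} . {?c U:$p_l$ ?d}. Its result consists of all combinations of subject-object pairs from the left triple pattern with subject-object pairs from the right triple pattern. In terms of Boolean algebra, such operations can be expressed using the Kronecker product, a generalization of the outer product,

$$\mathbf{T}_{:j:} \otimes \mathbf{U}_{:l:} = \begin{pmatrix} t_{1j1}\mathbf{U}_{:l:} & t_{1j2}\mathbf{U}_{:l:} & \cdots & t_{1j|O|}\mathbf{U}_{:l:} \\ t_{2j1}\mathbf{U}_{:l:} & t_{2j2}\mathbf{U}_{:l:} & \cdots & t_{2j|O|}\mathbf{U}_{:l:} \\ \vdots & \vdots & \ddots & \vdots \\ t_{|S|j1}\mathbf{U}_{:l:} & t_{|S|j2}\mathbf{U}_{:l:} & \cdots & t_{|S|j|O|}\mathbf{U}_{:l:} \end{pmatrix}. \qquad (6)$$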
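The unbound join of Equation (6) can be sketched with hypothetical slices. Every non-zero of the Kronecker product is decoded into the four variables. The number of solutions is the product of the non-zero counts of the two slices, as expected for a cross product.

```python
# Sketch of eq. (6): {?a T:p_j ?b} . {?c U:p_l ?d} (nothing bound) as T_{:j:} (Kronecker) U_{:l:}.

def kronecker(X, Y):
    return [[x * y for x in row_x for y in row_y] for row_x in X for row_y in Y]


T_j = [[1, 0], [0, 1]]  # |S(T)|-by-|O(T)| (hypothetical)
U_l = [[0, 1, 1]]       # |S(U)|-by-|O(U)| (hypothetical)

K = kronecker(T_j, U_l)
n_c, n_d = len(U_l), len(U_l[0])
solutions = [
    (r // n_c, col // n_d, r % n_c, col % n_d)  # (?a, ?b, ?c, ?d)
    for r, row in enumerate(K) for col, v in enumerate(row) if v
]
print(solutions)  # [(0, 0, 0, 1), (0, 0, 0, 2), (1, 1, 0, 1), (1, 1, 0, 2)]
```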


An even more general notion of the outer product is needed to express a join between two triple patterns when no variable is fixed or bound, as in {?a ?b ?c} . {?d ?e ?f}. This can be accomplished by multiplying two tensors, $\mathcal{T} \otimes \mathcal{U}$, as discussed in [5]. This operation, although possible, appears to have little practical relevance.
4.1.5 None Fixed, Some Bound


The other extreme scenario is to have no fixed variables but varying numbers of bound variables in the triple patterns to join. The last case discussed in the previous section is already an example of such a join. The other end, however, is much simpler to start with. Consider a join query where all variables are bound: {?a ?b ?c} . {?a ?b ?c}. It asks for all RDF triples that occur in RDF graph $T$ and match an RDF triple that occurs in RDF graph $U$. So this query just yields all triples that occur in both $T$ and $U$. In Boolean tensor algebra, this amounts to the position-wise product (AND) of the corresponding tensors $\mathcal{T}$ and $\mathcal{U}$ of common dimension $|S|$-by-$|P|$-by-$|O|$ (possibly achieved by a preprocessing step),

$$\bigvee_{i=1}^{|S|} \bigvee_{j=1}^{|P|} \bigvee_{k=1}^{|O|} t_{ijk} \otimes u_{ijk}, \qquad (7)$$

where $i = 1, \ldots, |S|$, $j = 1, \ldots, |P|$, and $k = 1, \ldots, |O|$.

Similarly, if two variables are bound, as in {?a ?b ?c} . {?a ?e ?c}, we receive only those triples whose subjects and objects match each other. This means we take the outer product of each fibre $\mathbf{t}_{i:k}$ from $\mathcal{T}$ with the corresponding fibre $\mathbf{u}_{i:k}$ from $\mathcal{U}$,

$$\bigvee_{i=1}^{|S|} \bigvee_{k=1}^{|O|} \mathbf{t}_{i:k} \boxtimes \mathbf{u}_{i:k}, \qquad (8)$$

where $i = 1, \ldots, |S|$ and $k = 1, \ldots, |O|$.

If one variable is bound, as in {?a ?b ?c} . {?a ?e ?f}, we receive only those triples whose subjects match each other. This means that we take the Kronecker product of each slice $\mathbf{T}_{i::}$ from $\mathcal{T}$ with the corresponding slice $\mathbf{U}_{i::}$ from $\mathcal{U}$. For $i = 1, \ldots, |S|$, we get

$$\bigvee_{i=1}^{|S|} \mathbf{T}_{i::} \otimes \mathbf{U}_{i::}, \qquad (9)$$

a matrix of size $|P(T)||P(U)|$-by-$|O(T)||O(U)|$.
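The fully bound case can be sketched as a position-wise AND of two small tensors. Both tensors are hypothetical and are assumed to share one labelling already (the preprocessing step mentioned above). The non-zeros of the result are exactly the triples present in both graphs.

```python
# Sketch: {?a ?b ?c} . {?a ?b ?c} over two graphs T and U whose tensors already
# share one |S|-by-|P|-by-|O| labelling (a preprocessing step assumed here).

def intersect(T, U):
    """Position-wise AND: the triples present in both graphs."""
    return [[[t & u for t, u in zip(fib_t, fib_u)]
             for fib_t, fib_u in zip(slice_t, slice_u)]
            for slice_t, slice_u in zip(T, U)]


T = [[[1, 0], [0, 1]], [[1, 1], [0, 0]]]  # hypothetical 2-by-2-by-2 tensors
U = [[[1, 1], [0, 0]], [[0, 1], [1, 0]]]
C = intersect(T, U)
print([(i, j, k) for i in range(2) for j in range(2) for k in range(2) if C[i][j][k]])
# [(0, 0, 0), (1, 0, 1)]
```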


4.2 Optional Graph Patterns — Left Outer Join 


Apart from basic graph patterns, there are various other ways to combine triple patterns: group graph patterns, optional graph patterns, alternative graph patterns, and patterns on named graphs. Group graph patterns define the evaluation hierarchy, which directly translates to the associativity of Boolean tensor operations. Alternative graph patterns refer to the union of triple patterns, which translates easily to OR-ing the respective tensor slices. Patterns on named graphs refer to the possibility of joining triple patterns from different RDF graphs.

The combination of triple patterns we discuss now is the optional graph pattern. These patterns resemble left outer joins: all triples from the left triple pattern have to appear in the final result. If they do not match any triple from the right pattern, they are listed paired with a blank item. Hence, we use all the triples that match, as in the normal join, and some more. To accomplish a left outer join using Boolean operations, we can thus use the join operation discussed above, but we additionally need to handle blank items.

Consider the case where we join two triple patterns with fixed predicates and bound subjects (Equation (4)). In the Khatri-Rao product, we get a 1 at positions where the left and right triples match on their subjects. We now also need to cover triples whose subjects occur in the left triple pattern but not in the right one. To do so, we append a column to $\mathbf{U}_{:j:}$, the slice corresponding to the right triple pattern. That column has a 1 in the rows where $\mathbf{U}_{:j:}$ is all-zero and $\mathbf{T}_{:i:}$ (the slice corresponding to the left triple pattern) has at least one non-zero. To evaluate {?a T:$p_i$ ?b} OPTIONAL {?a U:$p_j$ ?c}, we first append the extra column $\mathbf{k}$ to $\mathbf{U}_{:j:}$ and obtain

$$\mathbf{U}'_{:j:} = [\mathbf{U}_{:j:}, \mathbf{k}], \qquad k_s = \Big(\bigvee_{r=1}^{|O(T)|} t_{sir}\Big) \wedge \neg\Big(\bigvee_{r=1}^{|O(U)|} u_{sjr}\Big), \quad s = 1, \ldots, |S|. \qquad (10)$$

Then, as for the join on subjects, we compute the transposed Khatri-Rao product of the transposes of $\mathbf{T}_{:i:}$ and $\mathbf{U}'_{:j:}$. This gives an $|S|$-by-$|O(T)|(|O(U)| + 1)$ matrix as the result,

$$\left((\mathbf{T}_{:i:})^T \odot (\mathbf{U}'_{:j:})^T\right)^T = \left(\mathbf{t}_{1i:} \otimes \mathbf{u}'_{1j:}, \mathbf{t}_{2i:} \otimes \mathbf{u}'_{2j:}, \ldots, \mathbf{t}_{|S|i:} \otimes \mathbf{u}'_{|S|j:}\right)^T. \qquad (11)$$

The additional column we introduce actually stands for "no value" and should be treated accordingly in any follow-up calculation on the obtained result.
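Equations (10) and (11) can be sketched with hypothetical slices that share one subject ordering. The appended column is decoded as `None`, a stand-in chosen here for the "no value" item. Subject 1 matches nothing in $U$ and therefore appears paired with a blank, while subject 2 has no left triple and does not appear at all.

```python
# Sketch of eqs. (10)-(11): {?a T:p_i ?b} OPTIONAL {?a U:p_j ?c}.
# Hypothetical slices sharing one subject ordering; None marks the "no value" column.

def transpose(M):
    return [list(col) for col in zip(*M)]


def khatri_rao(X, Y):
    m = len(X[0])
    if len(Y[0]) != m:
        raise ValueError("need the same number of columns")
    return [[row_x[c] * row_y[c] for c in range(m)] for row_x in X for row_y in Y]


T_i = [[1, 0], [0, 1], [0, 0]]   # |S|-by-|O(T)|
U_j = [[1, 1], [0, 0], [1, 0]]   # |S|-by-|O(U)|

# k_s = (row s of T_i has a 1) AND NOT (row s of U_j has a 1)
k = [int(any(t_row) and not any(u_row)) for t_row, u_row in zip(T_i, U_j)]
U_prime = [u_row + [k_s] for u_row, k_s in zip(U_j, k)]  # [U_j, k]

R = transpose(khatri_rao(transpose(T_i), transpose(U_prime)))  # |S|-by-|O(T)|(|O(U)|+1)
width = len(U_prime[0])
blank = width - 1  # index of the appended column
solutions = [(a, col // width, None if col % width == blank else col % width)
             for a, row in enumerate(R) for col, v in enumerate(row) if v]
print(k)          # [0, 1, 0]
print(solutions)  # [(0, 0, 0), (0, 0, 1), (1, 1, None)]
```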

4.3 Unique Results — Select Distinct 

Until now, we have discussed different WHERE clauses. This section focuses on the keyword DISTINCT, a modifier of the SELECT clause. This keyword states that the solutions in the result sequence of the query must be unique, so no duplicate solutions can occur. The naive way to approach this type of query is to first compute the solution from the WHERE clause and then purge the duplicates. Of course, the same can be done when treating the RDF data as a binary tensor. We compute the result for the WHERE clause using the operations stated above, then OR along the dimensions not asked for in the SELECT clause. But the binary tensor representation also offers a more direct approach for such queries. For a join query accompanied by DISTINCT, the result can be obtained immediately instead of computing the Khatri-Rao product and then OR-ing.

The first case we examine is a join query where SELECT DISTINCT chooses the bound variable, such as SELECT DISTINCT ?b WHERE {?a T:$p_i$ ?b} . {?c U:$p_j$ ?b}. Suppose we use the Khatri-Rao product to calculate the result of the WHERE clause. Then, the distinct objects ?b are those corresponding to the non-zero columns,

$$\bigvee_{k=1}^{|S(T)||S(U)|} \left(\mathbf{T}_{:i:} \odot \mathbf{U}_{:j:}\right)_{k:}. \qquad (12)$$

In fact, the objects that constitute the result are those that occur at least once in both $\mathbf{T}_{:i:}$ and $\mathbf{U}_{:j:}$. Hence, this calculation simplifies to checking whether the corresponding column is non-zero in both $\mathbf{T}_{:i:}$ and $\mathbf{U}_{:j:}$,

$$\Big(\bigvee_{k=1}^{|S(T)|} \mathbf{t}_{ki:}\Big) \wedge \Big(\bigvee_{k=1}^{|S(U)|} \mathbf{u}_{kj:}\Big). \qquad (13)$$

The resulting vector of length $|O|$ has a 1 at the positions of objects that are part of the result and zeros elsewhere.
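The equivalence of Equations (12) and (13) can be checked on hypothetical slices. The naive route materialises the Khatri-Rao product and ORs its rows. The direct route only ORs each slice over its rows and ANDs the two resulting vectors.

```python
# Sketch of eqs. (12)-(13): SELECT DISTINCT ?b WHERE {?a T:p_i ?b} . {?c U:p_j ?b}.

def khatri_rao(X, Y):
    m = len(X[0])
    if len(Y[0]) != m:
        raise ValueError("need the same number of columns")
    return [[row_x[c] * row_y[c] for c in range(m)] for row_x in X for row_y in Y]


def column_or(M):
    """OR over all rows: 1 in column c iff column c has a non-zero."""
    return [int(any(col)) for col in zip(*M)]


T_i = [[1, 0, 1], [0, 0, 1]]  # hypothetical |S(T)|-by-|O| slice
U_j = [[0, 0, 1], [1, 0, 0]]  # hypothetical |S(U)|-by-|O| slice

naive = column_or(khatri_rao(T_i, U_j))                             # eq. (12)
direct = [a & b for a, b in zip(column_or(T_i), column_or(U_j))]   # eq. (13)
print(naive, direct)  # [1, 0, 1] [1, 0, 1]
```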

The next case is to uniquely select pairs of one bound and one free variable, as for instance in SELECT DISTINCT ?a ?b WHERE {?a T:$p_i$ ?b} . {?c U:$p_j$ ?b}. To evaluate this query, we first calculate the $|S(T)||S(U)|$-by-$|O|$ Khatri-Rao product as discussed in Section 4.1. Note that we treat the rows of the resulting matrix as $|S(T)|$ blocks of $|S(U)|$ rows each. The next step is then to get an $|S(T)|$-by-$|O|$ matrix with a 1 at position $(m, n)$ if the $n$th column of the $m$th block has at least one non-zero, where $m \in \{1, \ldots, |S(T)|\}$ and $n \in \{1, \ldots, |O|\}$. More directly, we keep those columns of $\mathbf{T}_{:i:}$ whose corresponding columns of $\mathbf{U}_{:j:}$ have at least one non-zero, and zero out all other columns. This gives the same answer; entry $(m, n)$ of the result is

$$t_{min} \wedge \bigvee_{k=1}^{|S(U)|} u_{kjn}. \qquad (14)$$
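Equation (14) can be checked against the block-wise route on hypothetical slices. The naive route ORs the rows of each $|S(U)|$-row block of the Khatri-Rao product. The direct route masks the columns of $\mathbf{T}_{:i:}$ with the column-OR of $\mathbf{U}_{:j:}$.

```python
# Sketch of eq. (14): SELECT DISTINCT ?a ?b WHERE {?a T:p_i ?b} . {?c U:p_j ?b},
# checked against OR-ing each |S(U)|-row block of the Khatri-Rao product.

def khatri_rao(X, Y):
    m = len(X[0])
    if len(Y[0]) != m:
        raise ValueError("need the same number of columns")
    return [[row_x[c] * row_y[c] for c in range(m)] for row_x in X for row_y in Y]


T_i = [[1, 0, 1], [0, 1, 0]]  # hypothetical |S(T)|-by-|O| slice
U_j = [[0, 0, 1], [0, 0, 0]]  # hypothetical |S(U)|-by-|O| slice

K = khatri_rao(T_i, U_j)
h = len(U_j)  # block height |S(U)|
naive = [[int(any(col)) for col in zip(*K[m * h:(m + 1) * h])] for m in range(len(T_i))]

u_any = [int(any(col)) for col in zip(*U_j)]           # columns of U_j with a non-zero
direct = [[t & u for t, u in zip(row, u_any)] for row in T_i]
print(naive, direct)  # [[0, 0, 1], [0, 0, 0]] [[0, 0, 1], [0, 0, 0]]
```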

The final case we examine is to uniquely select pairs of the unbound variables, as in the query SELECT DISTINCT ?a ?c WHERE {?a T:$p_i$ ?b} . {?c U:$p_j$ ?b}, where $i \in \{1, \ldots, |P(T)|\}$ and $j \in \{1, \ldots, |P(U)|\}$. We show how the naive approach (first computing the Khatri-Rao product, then OR-ing along the columns) corresponds to the Boolean matrix product

$$\mathbf{T}_{:i:} \circ (\mathbf{U}_{:j:})^T. \qquad (15)$$

The query asks for all unique ?a ?c pairs that match on ?b. As described in Section 4.1, we get all ?a ?c pairs that match on ?b by computing the Khatri-Rao product of the respective slices of $\mathcal{T}$ and $\mathcal{U}$. To retrieve only unique subject-subject combinations, we compute the OR along the columns of the Khatri-Rao product (corresponding to the objects). This is

$$\bigvee_{k=1}^{|O|} \left(\mathbf{T}_{:i:} \odot \mathbf{U}_{:j:}\right)_{:k} = \mathbf{t}_{:i1} \otimes \mathbf{u}_{:j1} \vee \mathbf{t}_{:i2} \otimes \mathbf{u}_{:j2} \vee \cdots \vee \mathbf{t}_{:i|O|} \otimes \mathbf{u}_{:j|O|} = \begin{pmatrix} \bigvee_{k=1}^{|O|} t_{1ik} u_{1jk} \\ \bigvee_{k=1}^{|O|} t_{1ik} u_{2jk} \\ \vdots \\ \bigvee_{k=1}^{|O|} t_{1ik} u_{|S(U)|jk} \\ \bigvee_{k=1}^{|O|} t_{2ik} u_{1jk} \\ \vdots \\ \bigvee_{k=1}^{|O|} t_{|S(T)|ik} u_{|S(U)|jk} \end{pmatrix} \qquad (16)$$

$$= \operatorname{vec}\Big(\big(\mathbf{T}_{:i:} \circ (\mathbf{U}_{:j:})^T\big)^T\Big), \qquad (17)$$

Since vec stacks the columns of its argument, this is the rows of the Boolean matrix product $\mathbf{T}_{:i:} \circ (\mathbf{U}_{:j:})^T$ stacked into a single vector. This matches the well-known fact that join-distinct in standard relational databases can be computed using the Boolean matrix product. In particular, we can also apply the techniques for fast computation of these joins to this case (cf. Section 5.2).
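Equations (15) to (17) can be checked numerically on hypothetical slices. OR-ing each row of the Khatri-Rao product over the object columns gives the same vector as stacking the rows of $\mathbf{T}_{:i:} \circ (\mathbf{U}_{:j:})^T$.

```python
# Sketch of eqs. (15)-(17): SELECT DISTINCT ?a ?c WHERE {?a T:p_i ?b} . {?c U:p_j ?b}.

def khatri_rao(X, Y):
    m = len(X[0])
    if len(Y[0]) != m:
        raise ValueError("need the same number of columns")
    return [[row_x[c] * row_y[c] for c in range(m)] for row_x in X for row_y in Y]


def boolean_matmul(X, Y):
    return [[int(any(row[k] and Y[k][j] for k in range(len(Y)))) for j in range(len(Y[0]))]
            for row in X]


def transpose(M):
    return [list(col) for col in zip(*M)]


T_i = [[1, 0, 1], [0, 1, 0]]             # hypothetical |S(T)|-by-|O| slice
U_j = [[0, 0, 1], [1, 1, 0], [0, 0, 0]]  # hypothetical |S(U)|-by-|O| slice

# Naive: OR each row of the Khatri-Rao product over the object columns.
naive = [int(any(row)) for row in khatri_rao(T_i, U_j)]
# Direct: Boolean product T_i o (U_j)^T, rows stacked into one vector.
B = boolean_matmul(T_i, transpose(U_j))
stacked = [v for row in B for v in row]
print(B)                 # [[1, 1, 0], [0, 1, 0]]
print(naive == stacked)  # True
```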

There is one conceptual difference between first computing the initial solution sequence from the WHERE clause and then applying DISTINCT, and computing the result in one step. In the one-step approach, the specified execution order cannot be obeyed. SPARQL defines that ORDER BY statements must be applied to the initial solution sequence before the projection, that is, after processing everything from the WHERE clause but before anything from the SELECT statement. However, by using binary tensors instead of graphs to represent the RDF data, and in particular by using lists of labels instead of sets, we already introduce an ordering before the WHERE clause is processed. If this ordering obeys the ORDER BY statement, we can also represent the output in the appropriate order without the intermediate step.

Note that SPARQL also defines the keyword REDUCED that modifies the SELECT clause such that it 
outputs at most all results that SELECT without modifiers would yield and at least those results that 
SELECT DISTINCT yields. Hence, there is no new operation to define for that modifier, as we already 
discussed two options to provide a valid answer to a SELECT REDUCED query. 


4.4 Matching Alternatives — Union 

The keyword UNION may be placed between two graph patterns in a WHERE clause. The result of a union query is a concatenation of the solution sequences of the graph patterns. We can interpret this as an OR between the results of both graph patterns. Note, however, that no matching takes place in this type of query. The initial solution sequence comprises all results from both graph patterns. Each result is equipped with all the variables that occur in either of the graph patterns, possibly left blank.

Consider SELECT ?b WHERE {?a T:$p_i$ ?b} UNION {?c U:$p_j$ ?b}. The initial solution sequence comprises all ?a-?b pairs with an additional blank ?c, as well as all ?c-?b pairs with an additional blank ?a. This is the same as concatenating $\mathbf{T}_{:i:}$ and $\mathbf{U}_{:j:}$ along the columns. The final result would be all objects from $T$ with predicate $p_i$ and all objects from $U$ with predicate $p_j$. In our terminology, it would be a vector of length $|O(T)| + |O(U)|$. It has a 1 at each position that refers to an object of $T$ or $U$ occurring in the result. Similarly, consider a query like SELECT ?a ?c WHERE {?a T:$p_i$ ?b} UNION {?c U:$p_j$ ?d}. Here $\mathbf{T}_{:i:}$ and $\mathbf{U}_{:j:}$ would be concatenated along the diagonal (block-diagonally), as no variable is bound. The final solution sequence would comprise all subjects from $T$ and all subjects from $U$, bound to separate variable names, each paired with a blank.


4.5 Further Operations 

While join and union operations directly act on the data structure, SPARQL also defines other types of 
operations that act on the labels of the data. For instance, it is possible to order the solution sequence 
using a comparator on the labels of one mode. Also, one can use filters to restrict the solutions to those 
for which the filter expression yields true. Filters can, for example, be comparators to restrict numeric
values, or regular expressions to restrict the values of strings. 

The ASK statement can be used in place of SELECT. The result of such a query is true if the solution 
sequence obtained from the WHERE part is non-empty. Hence, if we calculate the result on Boolean tensors, 
we emit TRUE as soon as the first non-zero appears in the result. 

SPARQL provides the keyword CONSTRUCT, which allows a new RDF graph to be generated from the result of a query. As any RDF graph can be expressed as a Boolean tensor, an RDF tensor can also be constructed by such a query.

5 Cardinality and Computation of Joins 

An important problem in query processing is to estimate the cardinality of join operations. This problem 
has attracted a significant amount of research in traditional relational databases. For SPARQL joins, 
Neumann and Moerkotte m proposed so called characteristic sets to estimate the cardinalities of SPARQL 
joins (specifically, the type of joins they called star joins). In this section we study how the size of join 
operations can be computed (or estimated) given our framework. 

5.1 Khatri-Rao Products

Let us first assume we have stored the marginal sums along each mode of the $|S|$-by-$|P|$-by-$|O|$ data tensor $\mathcal{T}$. That is, we have three matrices, $\mathbf{P}$ ($|P|$-by-$|O|$), $\mathbf{Q}$ ($|S|$-by-$|O|$), and $\mathbf{R}$ ($|S|$-by-$|P|$), for the column, row, and tube marginal sums, respectively. The element $(i, k)$ of $\mathbf{Q}$, for example, would be computed as $q_{ik} = \sum_{j=1}^{|P|} t_{ijk}$.

The number of triples returned by a join, with no projection or DISTINCT keyword, is the number of non-zeros in the result. Many joins can be expressed as a Khatri-Rao product between two matrices $\mathbf{A}$ and $\mathbf{B}$; this is the case for most joins considered in Sections 4.1 and 4.2. For such joins, the number of non-zeros in $\mathbf{A} \odot \mathbf{B}$ can be determined exactly using the column marginal sums of $\mathbf{A}$ and $\mathbf{B}$. Specifically, suppose $\mathbf{A}$ and $\mathbf{B}$ both have $n$ columns. Let $\sigma^A = (\sigma^A_i)_{i=1}^n$ and $\sigma^B = (\sigma^B_i)_{i=1}^n$ be row vectors that contain the column marginals of $\mathbf{A}$ and $\mathbf{B}$, respectively (e.g., $\sigma^A_i = \sum_j a_{ji}$).

Proposition 5.1. Let $\sigma^A$ and $\sigma^B$ be as above. The number of non-zeroes in $A \odot B$ is

$$|A \odot B| = \sum_{i=1}^{n} \sigma^A_i \sigma^B_i = \sigma^A (\sigma^B)^T . \tag{18}$$
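Proposition 5.1 can be checked on a toy example. The sketch below (pure Python, dense 0/1 lists, with helper names of our own) builds $A \odot B$ explicitly and compares its non-zero count with the dot product of the column marginals.

```python
from typing import List

Matrix = List[List[int]]


def khatri_rao(a: Matrix, b: Matrix) -> Matrix:
    # Column-wise Kronecker product: row i * len(b) + j, column c holds a[i][c] * b[j][c].
    n = len(a[0])
    return [[a[i][c] * b[j][c] for c in range(n)]
            for i in range(len(a)) for j in range(len(b))]


def column_sums(m: Matrix) -> List[int]:
    return [sum(col) for col in zip(*m)]


def nnz(m: Matrix) -> int:
    return sum(sum(row) for row in m)


A = [[1, 0, 1],
     [1, 1, 0]]
B = [[0, 1, 1],
     [1, 1, 0],
     [1, 0, 1]]

sigma_a, sigma_b = column_sums(A), column_sums(B)        # [2, 1, 1], [2, 2, 2]
predicted = sum(x * y for x, y in zip(sigma_a, sigma_b))  # sigma^A (sigma^B)^T = 8
assert nnz(khatri_rao(A, B)) == predicted
```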

The column marginal vectors $\sigma$ are naturally just appropriate rows or columns of the tensor marginal
sum matrices $P$, $Q$, or $R$. If they are stored in a sparse format, the size of the join can be computed
exactly in time $O(\alpha + \beta)$, where $\alpha$ and $\beta$ are the numbers of non-empty columns of $A$ and $B$, respectively.
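The following is a minimal sketch of the $O(\alpha + \beta)$ computation. It assumes each marginal vector is stored as a list of (index, value) pairs sorted by index, a storage layout chosen for illustration. A single merge pass visits every stored entry at most once.

```python
from typing import List, Tuple

SparseVec = List[Tuple[int, int]]  # (column index, marginal sum), sorted by index


def sparse_dot(u: SparseVec, v: SparseVec) -> int:
    """sum_i u_i * v_i for two index-sorted sparse vectors.

    Each stored entry is visited at most once, so the cost is O(alpha + beta)
    for vectors with alpha and beta stored entries.
    """
    total, p, q = 0, 0, 0
    while p < len(u) and q < len(v):
        (iu, xu), (iv, xv) = u[p], v[q]
        if iu == iv:
            total += xu * xv
            p += 1
            q += 1
        elif iu < iv:
            p += 1
        else:
            q += 1
    return total


sigma_a = [(0, 3), (4, 1), (7, 2)]
sigma_b = [(1, 5), (4, 2), (7, 3)]
print(sparse_dot(sigma_a, sigma_b))  # 1*2 + 2*3 = 8
```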

We can also obtain an upper bound for the size of the join in constant time if, in addition, we store
the $\ell_2$-norms of each row and column of the marginal sum matrices (that is, $\|\sigma\|$ for every possible $\sigma$):

Proposition 5.2. Let $\sigma^A$ and $\sigma^B$ be as above. Then

$$|A \odot B| \le \|\sigma^A\| \, \|\sigma^B\| . \tag{19}$$

Proof. Noticing that $\sigma^A (\sigma^B)^T = \|\sigma^A\| \, \|\sigma^B\| \cos\theta$, where $\theta$ is the angle between $\sigma^A$ and $\sigma^B$, together with (18) gives the result, as $\cos\theta \le 1$. □
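Here is a small numeric illustration of (19). It assumes the norms would in practice be precomputed and stored with the marginals; here they are computed on the fly, and the data are made up. The bound is an instance of the Cauchy-Schwarz inequality.

```python
import math
from typing import Sequence


def l2(v: Sequence[int]) -> float:
    return math.sqrt(sum(x * x for x in v))


def join_size_upper_bound(norm_a: float, norm_b: float) -> float:
    # Proposition 5.2: |A ⊙ B| <= ||sigma^A|| * ||sigma^B||.
    # With both norms stored, the bound costs one multiplication.
    return norm_a * norm_b


sigma_a = [3, 0, 1, 2]
sigma_b = [1, 4, 2, 2]
exact = sum(x * y for x, y in zip(sigma_a, sigma_b))     # 9, as in (18)
bound = join_size_upper_bound(l2(sigma_a), l2(sigma_b))  # sqrt(14) * 5, about 18.71
assert exact <= bound
```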

As of now, though, writing the join as a Khatri-Rao product does not seem to bring significant benefits
for computing the full join. The best worst-case bound is $O(|A|\,|B|)$, from the straightforward
evaluation of the product.

As mentioned in Section 4, those SELECT DISTINCT queries that choose the bound variable can be
computed by selecting the non-zero columns of the Khatri-Rao product. Consequently, the cardinality
of the result is the number of indices $i$ with $\sigma^A_i \sigma^B_i > 0$ (that is, $\sigma^A (\sigma^B)^T$ with the vectors $\sigma$ taken as binary
indicator vectors), and the whole query (not just the cardinality) can be computed in time $O(\alpha + \beta)$.
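The following sketches this DISTINCT evaluation. It assumes the marginal vectors are stored as dictionaries from column index to non-zero count, which is our choice of format. The surviving values of the bound variable are the indices that are non-zero in both vectors. Probing the larger dictionary while scanning the smaller one stays within the $O(\alpha + \beta)$ budget.

```python
from typing import Dict, List


def distinct_bound_values(sigma_a: Dict[int, int], sigma_b: Dict[int, int]) -> List[int]:
    """Indices i with sigma^A_i > 0 and sigma^B_i > 0, i.e. the non-zero
    columns of A ⊙ B. Scanning the smaller dict and probing the larger one
    takes O(min(alpha, beta)) expected time; sorting only fixes the output order."""
    small, large = (sigma_a, sigma_b) if len(sigma_a) <= len(sigma_b) else (sigma_b, sigma_a)
    return sorted(i for i, count in small.items() if count > 0 and large.get(i, 0) > 0)


values = distinct_bound_values({1: 2, 3: 1, 5: 4}, {3: 7, 4: 1, 5: 1})
print(values, len(values))  # [3, 5] 2
```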

All of the above computations require access to the marginal sums. We argue that storing (and
updating) this information is feasible for the database system. In principle, the matrices can be very
large, but notice that every $(s, p, o)$ triple of the data affects exactly one element in each of the
three matrices. Hence, the total number of non-zeroes in the marginal matrices cannot exceed $3|\mathcal{T}|$, and
in practice it would be expected to stay much smaller.
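To illustrate the update cost, here is a minimal sketch that maintains $P$, $Q$, and $R$ as sparse counters keyed by index pairs. It assumes triples are already mapped to integer ids, and the `Marginals` class is hypothetical. Inserting or deleting a triple touches exactly one entry of each matrix.

```python
from collections import Counter
from typing import Tuple

Triple = Tuple[int, int, int]  # (subject id, predicate id, object id)


class Marginals:
    """Sparse marginal sums of the |S|-by-|P|-by-|O| Boolean tensor (hypothetical helper)."""

    def __init__(self) -> None:
        self.P = Counter()  # (p, o) -> number of subjects   (column sums)
        self.Q = Counter()  # (s, o) -> number of predicates (row sums)
        self.R = Counter()  # (s, p) -> number of objects    (tube sums)

    def add(self, t: Triple) -> None:
        # Assumes t is not stored yet (the tensor is Boolean).
        s, p, o = t
        self.P[p, o] += 1
        self.Q[s, o] += 1
        self.R[s, p] += 1

    def remove(self, t: Triple) -> None:
        # Assumes t is currently stored; entries that drop to zero are deleted.
        s, p, o = t
        for counts, key in ((self.P, (p, o)), (self.Q, (s, o)), (self.R, (s, p))):
            counts[key] -= 1
            if counts[key] == 0:
                del counts[key]

    def nnz(self) -> int:
        return len(self.P) + len(self.Q) + len(self.R)


triples = {(0, 0, 1), (0, 1, 1), (1, 0, 1), (2, 1, 0)}
m = Marginals()
for t in triples:
    m.add(t)
assert m.nnz() <= 3 * len(triples)  # here 3 + 3 + 4 = 10 <= 12
```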
5.2 Boolean Products


When the SELECT DISTINCT query asks for a pair of variables, the evaluation does not (have to) involve
the Khatri-Rao product, but rather the Boolean matrix product (Section 4.3). Here, estimating the size
of the result becomes harder. Take, for example, (HU): computing the result involves taking an OR over
the columns of a Khatri-Rao product, and hence, even if we know the cardinalities of each column, we
can only give very coarse estimates of the final cardinality:

Proposition 5.3. Let $A$ and $B$ be two binary matrices with $n$ columns and let $\sigma^A$ and $\sigma^B$ be their
column marginal sum vectors. Then,

$$\max_{i=1}^{n} \{\sigma^A_i \sigma^B_i\} \le \left| \bigvee_{i=1}^{n} (A \odot B)_{:i} \right| \le \sum_{i=1}^{n} \sigma^A_i \sigma^B_i . \tag{20}$$
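The bounds in (20) can be illustrated on small matrices. The sketch below uses our own helper names and example data. It counts the rows $(i, j)$ of $A \odot B$ that are non-zero in at least one column. It then checks that this count lies between the largest and the summed column products.

```python
from typing import List

Matrix = List[List[int]]


def or_of_khatri_rao_columns(a: Matrix, b: Matrix) -> int:
    # Row (i, j) of the OR over the columns of A ⊙ B is 1 iff
    # a[i][c] = b[j][c] = 1 for at least one column c.
    n = len(a[0])
    return sum(1 for ra in a for rb in b if any(ra[c] and rb[c] for c in range(n)))


A = [[1, 1, 0],
     [0, 1, 1]]
B = [[1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]
sa = [sum(col) for col in zip(*A)]  # [1, 2, 1]
sb = [sum(col) for col in zip(*B)]  # [1, 2, 1]
size = or_of_khatri_rao_columns(A, B)
lower = max(x * y for x, y in zip(sa, sb))
upper = sum(x * y for x, y in zip(sa, sb))
assert lower <= size <= upper  # 4 <= 5 <= 6
```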


As a Khatri-Rao product, $A \odot B$ has a specific structure that could help to estimate the cardinality
better. Another approach is to estimate the cardinality from the Boolean product formulation (17).
We will first sketch some estimates of the expectation that are simple and fast to compute, and could
therefore be of interest to practitioners.

Proposition 5.4. Let $A$ and $B$ be $m$-by-$k$ and $k$-by-$n$ binary matrices whose non-zeroes are uniformly
distributed, and let $p^A$ and $p^B$ be the densities of $A$ and $B$, respectively. Then

$$\mathrm{E}[|A \circ B|] = mn\left(1 - (1 - p^A p^B)^k\right) . \tag{21}$$
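Below is a direct transcription of (21). It assumes only the dimensions and the total numbers of non-zeroes of $A$ and $B$ are known, and it treats the entries as independent. The function name is ours.

```python
def expected_boolean_product_nnz(m: int, k: int, n: int, nnz_a: int, nnz_b: int) -> float:
    """Estimate (21) for an m-by-k matrix A and a k-by-n matrix B.

    Treating entries as independent with densities p_a and p_b, entry (i, j)
    of A ∘ B is 0 only if all k products a_il * b_lj are 0.
    """
    p_a = nnz_a / (m * k)
    p_b = nnz_b / (k * n)
    return m * n * (1.0 - (1.0 - p_a * p_b) ** k)


print(expected_boolean_product_nnz(2, 2, 2, 2, 2))  # 4 * (1 - 0.75**2) = 1.75
```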


We can improve our estimate of the expectation if we notice that the Boolean product is equivalent
to the element-wise OR of $k$ rank-1 binary matrices.

Proposition 5.5. Let $A$ and $B$ be $m$-by-$k$ and $k$-by-$n$ binary matrices and let $p^A = (p^A_i)_{i=1}^k$ and $p^B = (p^B_i)_{i=1}^k$
be the column densities of $A$ and the row densities of $B$, respectively. Assuming the non-zeroes in
the rank-1 matrices are distributed uniformly at random for all $i = 1, \ldots, k$, we have that

$$\mathrm{E}[|A \circ B|] = mn\left(1 - \prod_{i=1}^{k} (1 - p^A_i p^B_i)\right) . \tag{22}$$


Notice that if we take the union bound of the densities of the rank-1 matrices to obtain an upper
bound for the number of non-zeroes in $A \circ B$, we obtain the same result as in Proposition 5.3.
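The estimate (22) and the union bound mentioned above can be computed from the same marginals. This sketch assumes the column sums of $A$ and the row sums of $B$ are given as lists. The helper names are illustrative, and Python 3.8 or later is needed for `math.prod`.

```python
import math
from typing import Sequence


def expected_nnz_per_component(m: int, n: int, col_nnz_a: Sequence[int],
                               row_nnz_b: Sequence[int]) -> float:
    # Estimate (22): component i (column i of A times row i of B) has density
    # p^A_i * p^B_i; an entry of A ∘ B stays 0 only if every component misses it.
    miss = math.prod(1.0 - (ca / m) * (rb / n) for ca, rb in zip(col_nnz_a, row_nnz_b))
    return m * n * (1.0 - miss)


def union_bound(col_nnz_a: Sequence[int], row_nnz_b: Sequence[int]) -> int:
    # mn * sum_i p^A_i p^B_i = sum_i (column sum of A) * (row sum of B),
    # the upper bound of Proposition 5.3.
    return sum(ca * rb for ca, rb in zip(col_nnz_a, row_nnz_b))


est = expected_nnz_per_component(2, 2, [2, 1], [1, 2])  # 4 * (1 - 0.5 * 0.5) = 3.0
assert est <= union_bound([2, 1], [1, 2])              # 3.0 <= 4
```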

The above methods are straightforward and require only the marginal sums, which makes them
relatively fast to compute. If we can access the whole matrices, we can make much better estimates. In
particular, we can use the result of Amossen et al. [1]:

Proposition 5.6 ([1]). There exists an algorithm that obtains a $(1 + \varepsilon)$-approximation of $|A \circ B|$ in
expected time $O(|A| + |B|)$ for any $\varepsilon > 4/\sqrt[4]{|A| + |B|}$.

Computing the Boolean product can also be done faster than the standard matrix product: for
example, Amossen and Pagh [2] proposed an algorithm that runs in time $\tilde{O}(s^{2/3} z^{2/3} + s^{0.862} z^{0.408})$,
where $s = |A| + |B|$ is the number of non-zeroes in the input, $z = |A \circ B|$ is the number of non-zeroes in
the output, and $\tilde{O}(\cdot)$ hides the polylogarithmic factors. For sparse input matrices, the bound is generally
very good, but if the input matrices are quite dense (and the output moderately so), the overall time
complexity exceeds that of standard matrix multiplication, i.e. $O(m^\omega)$ for $m$-by-$m$ matrices, where $\omega$ is the
matrix multiplication exponent. To address this, Lingas [18] proposed a randomized algorithm that runs in time
$\tilde{O}(m^2 s^{\omega/2 - 1})$, where the factor matrices are $m$-by-$m$ and $s$ is as above. (See [18] for an extension to
rectangular matrices.)


6 Tensor Decompositions 

Similarly to matrices, decomposition is a natural operation on tensors. A tensor decomposition reveals
regularities of the decomposed tensor, and these regularities can sometimes speed up the computations.
Furthermore, finding the regularities is a natural data analysis task. In this section, we will first give
the definition of (Boolean) tensor CP decomposition and (Boolean) tensor rank. We will then study
the properties of the decompositions, especially their sparsity. Unlike with the normal decomposition, with
Boolean decompositions we can prove upper bounds on the density of the factor matrices, showing that
storing the data in a decomposed format is a valid option for saving space.
6.1 Definitions

The so-called Boolean tensor CP decomposition³ is defined as follows:

Definition 1 ([19]). Given an $n$-by-$m$-by-$l$ binary tensor $\mathcal{T}$ and an integer $r$, its rank-$r$ Boolean CP
decomposition consists of three binary matrices $A$ ($n$-by-$r$), $B$ ($m$-by-$r$), and $C$ ($l$-by-$r$) such that

$$\mathcal{T} = \bigvee_{i=1}^{r} a_{:i} \boxtimes b_{:i} \boxtimes c_{:i} . \tag{23}$$
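Definition 1 can be read operationally: each column triple contributes the block of coordinates where all three vectors are 1, and the Boolean sum is a set union. The following minimal sketch represents the binary tensor by the set of its non-zero coordinates, a representation we choose for illustration.

```python
from itertools import product
from typing import List, Set, Tuple

Matrix = List[List[int]]
Coords = Set[Tuple[int, int, int]]


def cp_reconstruct(A: Matrix, B: Matrix, C: Matrix) -> Coords:
    """Non-zero coordinates of OR_{i=1..r} a_:i ⊠ b_:i ⊠ c_:i, as in (23)."""
    tensor: Coords = set()
    for i in range(len(A[0])):
        rows_a = [x for x, row in enumerate(A) if row[i]]
        rows_b = [y for y, row in enumerate(B) if row[i]]
        rows_c = [z for z, row in enumerate(C) if row[i]]
        tensor.update(product(rows_a, rows_b, rows_c))  # Boolean OR = set union
    return tensor


A = [[1, 0], [1, 1]]
B = [[1, 1], [0, 1]]
C = [[1, 1], [0, 1]]
T = cp_reconstruct(A, B, C)
print(sorted(T))  # [(0, 0, 0), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)]
```

The coordinate (1, 0, 0) is covered by both rank-1 tensors but appears only once in the result. This is exactly where the Boolean sum differs from the ordinary one.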

The Boolean CP decomposition is closely connected to the concept of Boolean tensor rank. 

Definition 2. A 3-way binary tensor $\mathcal{T}$ is a rank-1 tensor if $\mathcal{T}$ is an outer product of three binary vectors,
that is

$$\mathcal{T} = a \boxtimes b \boxtimes c . \tag{24}$$

The Boolean rank of a 3-way binary tensor $\mathcal{T}$, denoted $\mathrm{rank}_B(\mathcal{T})$, is the least integer $r$ such that there
exist $r$ rank-1 binary tensors with

$$\mathcal{T} = \bigvee_{i=1}^{r} a_{:i} \boxtimes b_{:i} \boxtimes c_{:i} . \tag{25}$$

Notice that the $i$th columns of the factor matrices $A$, $B$, and $C$ of the CP decomposition define
a rank-1 tensor $a_{:i} \boxtimes b_{:i} \boxtimes c_{:i}$. In other words, the CP decomposition expresses the given tensor as a
Boolean sum of $r$ rank-1 tensors.

The Boolean CP decomposition can also be expressed in terms of Boolean and Khatri-Rao matrix
products using matricization: three binary matrices $A$, $B$, and $C$ form a Boolean CP decomposition of
$\mathcal{T}$ if and only if [12]

$$\mathcal{T}_{(1)} = A \circ (C \odot B)^T , \tag{26}$$

$$\mathcal{T}_{(2)} = B \circ (C \odot A)^T , \tag{27}$$

and

$$\mathcal{T}_{(3)} = C \circ (B \odot A)^T . \tag{28}$$
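Equation (26) can be checked numerically. The sketch assumes the common unfolding convention in which entry $t_{ijk}$ lands in column $j + k\,m$ (0-based) of $\mathcal{T}_{(1)}$. This convention matches the row order of $C \odot B$ produced below. All helper names are ours.

```python
from typing import List

Matrix = List[List[int]]


def khatri_rao(x: Matrix, y: Matrix) -> Matrix:
    # Row p * len(y) + q, column c of the result is x[p][c] AND y[q][c].
    r = len(x[0])
    return [[x[p][c] & y[q][c] for c in range(r)]
            for p in range(len(x)) for q in range(len(y))]


def boolean_product_transposed(x: Matrix, y: Matrix) -> Matrix:
    # x ∘ y^T with y given untransposed: entry (p, q) = OR_c (x[p][c] AND y[q][c]).
    return [[int(any(xp[c] & yq[c] for c in range(len(xp)))) for yq in y] for xp in x]


def mode1_unfolding(A: Matrix, B: Matrix, C: Matrix) -> Matrix:
    # T_(1) computed entry by entry from (23); t_ijk goes to column j + k * m.
    n, m, l, r = len(A), len(B), len(C), len(A[0])
    unfolded = [[0] * (m * l) for _ in range(n)]
    for i in range(n):
        for j in range(m):
            for k in range(l):
                unfolded[i][j + k * m] = int(any(A[i][c] & B[j][c] & C[k][c] for c in range(r)))
    return unfolded


A = [[1, 0], [1, 1]]
B = [[1, 1], [0, 1]]
C = [[1, 1], [0, 1]]
assert mode1_unfolding(A, B, C) == boolean_product_transposed(A, khatri_rao(C, B))  # (26)
```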

Intuitively, then, we can think of the operation that turns the three factor matrices into a tensor as
an operation that takes a join of two slices followed by a distinct join of the result and another slice.

6.2 Properties of the CP Decomposition 

Unfortunately, computing the rank or the CP decomposition of a tensor is NP-hard, under both the
normal [14] and the Boolean [19] algebras. Unlike with matrices, the rank is also not bounded by the smallest
dimension. That is, there exist $n$-by-$m$-by-$l$ binary tensors $\mathcal{T}$ such that $\mathrm{rank}_B(\mathcal{T}) > \min\{n, m, l\}$ [19].
However, we have an upper bound [19],

$$\mathrm{rank}_B(\mathcal{T}) \le \min\{nm, nl, ml\} . \tag{29}$$

Given the above results, we cannot hope for an efficient algorithm that finds the smallest CP decomposition
of the data. Yet, we can always find some CP decomposition. The most naive way is to unfold the
data along the longest mode (so that the unfolded matrix has $\min\{nm, nl, ml\}$ columns), set this as one
factor matrix, and then construct the other two factor matrices in such a way that their Khatri-Rao
product is the identity matrix (how that is done is explained in [19]). Henceforth we will assume that
our data tensor is represented in the factorized format.
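Here is one possible realisation of the naive construction, sketched under our own assumptions. Mode 1 is taken to be the longest mode, and this is not necessarily the construction given in [19]. Take $A = \mathcal{T}_{(1)}$, and let column $q = j + k\,m$ of $B$ and $C$ be the unit vectors $e_j$ and $e_k$. Then $C \odot B$ is the $ml$-by-$ml$ identity and, by (26), $A \circ (C \odot B)^T = \mathcal{T}_{(1)}$.

```python
from typing import List, Set, Tuple

Matrix = List[List[int]]
Coords = Set[Tuple[int, int, int]]


def naive_cp(tensor: Coords, n: int, m: int, l: int):
    """Trivial Boolean CP decomposition with r = m * l components.

    Column q = j + k*m of A is the mode-1 fibre t_{:jk}; the same column of
    B (resp. C) is the unit vector e_j (resp. e_k), so C ⊙ B is the identity.
    """
    r = m * l
    A = [[0] * r for _ in range(n)]
    B = [[0] * r for _ in range(m)]
    C = [[0] * r for _ in range(l)]
    for j in range(m):
        for k in range(l):
            B[j][j + k * m] = 1
            C[k][j + k * m] = 1
    for i, j, k in tensor:
        A[i][j + k * m] = 1
    return A, B, C


def reconstruct(A: Matrix, B: Matrix, C: Matrix) -> Coords:
    # OR of the rank-1 tensors a_:q ⊠ b_:q ⊠ c_:q.
    return {(i, j, k)
            for q in range(len(A[0]))
            for i in range(len(A)) if A[i][q]
            for j in range(len(B)) if B[j][q]
            for k in range(len(C)) if C[k][q]}


T = {(0, 0, 0), (1, 0, 1), (2, 1, 1)}
A, B, C = naive_cp(T, n=3, m=2, l=2)  # r = ml = 4 = min{nm, nl, ml}
assert reconstruct(A, B, C) == T
```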

A common motivation for storing data in factorized formats is that they can save space compared to
storing the full data, essentially by representing regularities in the data more efficiently. Yet, there
are usually no studies on how much space the decomposition could save, or cost. Boolean decompositions,
however, do allow us to bound the density (or sparsity) of the factors with respect to the density of the
data. First we repeat the result on the absolute number of non-zeroes in the factor matrices from [19]:

3 The name is short for two names given to the same decomposition: CANDECOMP [8] and PARAFAC [13].





Proposition 6.1. Let $\mathcal{T}$ be a binary tensor with $\mathrm{rank}_B(\mathcal{T}) = r$. Then $\mathcal{T}$ has a rank-$r$ Boolean CP
decomposition to $A$, $B$, and $C$ such that

$$|A| + |B| + |C| \le 3|\mathcal{T}| . \tag{30}$$


Inequality (30) is tight (consider the case of $|\mathcal{T}| = 1$), but it might paint a slightly too pessimistic
picture of the actual sparsity. Rather, we would like to follow [11] and relate the sparsity of the factors
to the sparsity of the data. If $\mathcal{T}$ is an $n$-by-$m$-by-$l$ binary tensor, we define its sparsity $s(\mathcal{T})$ as

$$s(\mathcal{T}) = 1 - \frac{|\mathcal{T}|}{nml} . \tag{31}$$


(The same notation is extended to matrices and vectors, with one (respectively two) modes having dimension
1.) Further, the rank-$r$ Boolean CP decomposition $(A, B, C)$ is reducible if there exists an index
$j \in \{1, \ldots, r\}$ such that

$$\bigvee_{i \ne j} a_{:i} \boxtimes b_{:i} \boxtimes c_{:i} = \bigvee_{i=1}^{r} a_{:i} \boxtimes b_{:i} \boxtimes c_{:i} . \tag{32}$$

If the decomposition is not reducible, it is said to be irreducible.

Proposition 6.2. Let $\mathcal{T}$ be a binary tensor that has a rank-$r$ irreducible (but not necessarily minimal)
Boolean CP decomposition to factors $A$, $B$, and $C$. Then

$$s(A) + s(B) + s(C) \ge s(\mathcal{T}) . \tag{33}$$

Our proof uses a similar technique to [11]. We will first prove the following special case.

Lemma 1. Proposition 6.2 holds when $r = 1$.

Proof. Tensor $\mathcal{T}$ must be rank-1. Hence, $|\mathcal{T}| = |A|\,|B|\,|C|$, or equivalently, $1 - s(\mathcal{T}) = (1 - s(A))(1 - s(B))(1 - s(C))$. It follows that

$$s(\mathcal{T}) = s(A) + s(B) + s(C) - s(A)s(B) - s(A)s(C) - s(B)s(C) + s(A)s(B)s(C) \le s(A) + s(B) + s(C) ,$$

which holds as $s(\cdot) \in [0, 1]$. □

Proof of Proposition 6.2. For this proof we need the concept of a residual tensor $\mathcal{R}_k$. Let $\mathcal{R}_1 = \mathcal{T}$, and
for $k = 2, \ldots, r$, let $\mathcal{R}_k = \mathcal{R}_{k-1} \wedge \neg(a_{:k} \boxtimes b_{:k} \boxtimes c_{:k})$, where the AND and NOT are element-wise. Notice
that, as the decomposition is irreducible, $s(\mathcal{R}_k) > s(\mathcal{R}_{k-1})$.

We now have, for $k = 1, \ldots, r$,

$$s(a_{:k}) + s(b_{:k}) + s(c_{:k}) \ge s(a_{:k} \boxtimes b_{:k} \boxtimes c_{:k}) \ge s(\mathcal{R}_k) \ge s(\mathcal{T}) .$$

The first inequality follows from Lemma 1 and the next ones from the fact that the Boolean semi-ring
is anti-negative. The statement follows, as

$$s(A) = \frac{1}{r} \sum_{k=1}^{r} s(a_{:k}) ,$$

and similarly for $s(B)$ and $s(C)$. □
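Here is a small numeric check of (31) to (33) on example factors of our own choosing. The sketch computes the sparsities, verifies irreducibility in the sense of (32) by brute force, and confirms the inequality. Helper names are illustrative.

```python
from typing import List, Set, Tuple

Matrix = List[List[int]]
Coords = Set[Tuple[int, int, int]]


def sparsity(M: Matrix) -> float:
    # Matrix case of (31): s(M) = 1 - |M| / (rows * cols).
    return 1 - sum(map(sum, M)) / (len(M) * len(M[0]))


def components(A: Matrix, B: Matrix, C: Matrix) -> List[Coords]:
    # Non-zero coordinates of each rank-1 tensor a_:q ⊠ b_:q ⊠ c_:q.
    return [{(i, j, k)
             for i in range(len(A)) if A[i][q]
             for j in range(len(B)) if B[j][q]
             for k in range(len(C)) if C[k][q]}
            for q in range(len(A[0]))]


def is_irreducible(comps: List[Coords]) -> bool:
    # (32): dropping any single component must change the Boolean sum.
    full = set().union(*comps)
    return all(set().union(*(c for p, c in enumerate(comps) if p != q)) != full
               for q in range(len(comps)))


A = [[1, 0], [1, 1], [0, 1]]
B = [[1, 0], [0, 1]]
C = [[1, 1], [0, 1]]
comps = components(A, B, C)
T = set().union(*comps)
s_T = 1 - len(T) / (len(A) * len(B) * len(C))    # 1 - 6/12 = 0.5
s_sum = sparsity(A) + sparsity(B) + sparsity(C)  # 1/3 + 1/2 + 1/4
assert is_irreducible(comps) and s_sum >= s_T    # (33)
```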

The above discussion strongly indicates that storing the data tensor in the factorized form can yield 
significant space savings, and in the worst case, should not increase the storage requirements too much. 
It seems reasonable to assume that the factorization would also benefit the computations: after
all, low-rank tensors should have more regular structure than high-rank tensors, and this regularity
should help with the computations. Unfortunately, there does not seem to be much work towards that
end. Bader and Kolda [6] study some operations on CP-decomposed tensors (under normal algebra),
and while their results generally carry over to the Boolean algebra, the operations they study are not
commonly seen in our framework.
7 Other Related Work


Tensors, as a way to represent multi-way relations, have been studied in the context of databases for a 
long time, although the term tensor is not commonly used. Instead, terms like Data Cube [12] are used
to refer to the data (and the associated operations, and the framework). 

It is also not new to use the binary tensor representation of RDF data to improve the processing of 
SPARQL queries. For example, Atre et al. [4] propose a technique to effectively process join queries using
binary tensors (referred to as Bit-cubes). Subsequent work also extends the method to left outer joins [3]. 

Tensor decompositions have been applied to RDF data earlier. For example, Drumond et al. [9] and 
Nickel et al. [23] use the decompositions to predict missing or unobserved RDF triples, while Erdos et
al. [10] use so-called Boolean Tucker3 decomposition to discover facts from “surface” ( s,p,o ) triples. 

For relational algebra, the tensor relational model [15] is a framework that supports both relational
algebraic operations for data manipulation and tensor algebraic operations for data analysis. Kim and
Candan propose efficient ways to combine the costly tensor decomposition needed for data analysis
with join [15], normalization [16], and union [17] operations for data manipulation. For the
join, for instance, the authors use non-negative tensor decompositions on the components to be joined in
order to approximate the decomposition after the operation.

Bakibayev et al. [7] propose factorised relational databases that use compact factorized representations 
of data to improve query performance and to reduce redundancy. The authors propose a specialized query 
engine to handle select-project-join queries on such data efficiently. 


8 Future Work 

In this paper we have presented a framework for SPARQL queries as Boolean tensor operations. We 
believe that our framework can help with the analysis of SPARQL queries, and with the development of
new optimizations. 

The Boolean tensor rank measures the complexity of the tensor: the higher the rank, the fewer
regularities the tensor has. It seems viable to use the tensor rank as a complexity parameter, in much
the same way as, say, treewidth is used. Yet, we are unaware of any research in that direction.

Most SPARQL operations (especially the joins) can be expressed using the Khatri-Rao product. Unlike
the normal matrix product, which has enjoyed significant research interest over the years, the Khatri-Rao
product is relatively unknown and unstudied. As it is the key to evaluating joins, insights into its
computation can lead to direct real-world benefits.

One can also ask the question the other way around, though. As the Khatri-Rao product (and the
Boolean matrix product) can be expressed as SPARQL queries, could this be used as a way to implement
more efficient algorithms for (Boolean) tensor analysis? A relatively simple goal could be to use the data
structures and indexing approaches employed by RDF databases, such as RDF-3X [22], for more efficient
data analysis algorithms.

Going further, one can also consider implementing whole data analysis algorithms on top of
RDF databases. Again, we argue that our tensor representation should help, giving a framework where
many data analysis problems map easily. It is clear, though, that SPARQL would have to be extended with new
operations should one want to implement more complicated data analysis directly on it (in much the same
way as standard data mining methods can be integrated into relational databases; see, e.g., [20, 24]).

9 Conclusions 

We have presented a framework for rdf and SPARQL based on Boolean tensors. The framework allows 
us to cast SPARQL queries as different types of matrix products, and use this formalisation to gain new 
insights and apply previously defined techniques to their processing. In particular, we showed how to
count (and estimate) the cardinality of various SPARQL join operations. We also briefly considered the
Boolean CP decomposition as an option to find regularities in the data, and to use these regularities for
improved query processing and storage. Whether the decompositions would help with query processing
is still wide open, but we did show two upper bounds for the space requirements of the decomposition.
Overall, we see this work only as the first step, as we believe that the framework we presented here 
facilitates better analysis of RDF and SPARQL. 




References 


[1] R. Amossen, A. Campagna, and R. Pagh. Better size estimation for sparse matrix products. Algorithmica,
69(3):741-757, 2014.

[2] R. R. Amossen and R. Pagh. Faster join-projects and sparse matrix multiplications. In ICDT ’09, 
pages 121-126, Mar. 2009. 

[3] M. Atre. A technique for SPARQL OPTIONAL (left-outer-join) queries. CoRR, abs/1304.7, 2013. 

[4] M. Atre, V. Chaoji, M. J. Zaki, and J. A. Hendler. Matrix “Bit” Loaded: A Scalable Lightweight
Join Query Processor for RDF Data. In WWW ’10, pages 41-50, 2010.

[5] B. Bader and T. Kolda. Algorithm 862: MATLAB tensor classes for fast algorithm prototyping. 
ACM Transactions on Mathematical Software, 32(4):635-653, 2006. 

[6] B. Bader and T. Kolda. Efficient MATLAB computations with sparse and factored tensors. SIAM 
Journal on Scientific Computing, 30(1):205-231, 2007. 

[7] N. Bakibayev, D. Olteanu, and J. Zavodny. Fdb: A query engine for factorised relational databases. 
Proceedings of the VLDB Endowment, 5(11):1232- 1243, 2012. 

[8] J. D. Carroll and J.-J. Chang. Analysis of individual differences in multidimensional scaling via an
N-way generalization of “Eckart-Young” decomposition. Psychometrika, 35(3):283-319, 1970.

[9] L. Drumond, S. Rendle, and L. Schmidt-Thieme. Predicting RDF triples in incomplete knowledge 
bases with tensor factorization. SAC ’12, page 326, 2012. 

[10] D. Erdos and P. Miettinen. Discovering Facts with Boolean Tensor Tucker Decomposition. In CIKM 
’13, pages 1569-1572, 2013. 

[11] N. Gillis and F. Glineur. Using underapproximations for sparse nonnegative matrix factorization. 
Pattern Recogn., 43(4):1676-1687, 2010. 

[12] J. Gray, S. Chaudhuri, and A. Bosworth. Data cube: A relational aggregation operator generalizing 
group-by, cross-tab, and sub-totals. In ICDE, pages 152-159, 1996. 

[13] R. A. Harshman. Foundations of the PARAFAC procedure: Models and conditions for an “explanatory”
multimodal factor analysis. Technical Report 16, UCLA Working Papers in Phonetics, 1970.

[14] J. Hastad. Tensor rank is NP-complete. J. Algorithm, 11(4):644-654, Dec. 1990.

[15] M. Kim and K. Candan. Approximate tensor decomposition within a tensor-relational algebraic
framework. CIKM ’11, pages 1737-1742, 2011.

[16] M. Kim and K. Candan. Decomposition-by-normalization (DBN): leveraging approximate functional
dependencies for efficient tensor decomposition. CIKM ’12, pages 355-364, 2012.

[17] M. Kim and K. Candan. Pushing-down tensor decompositions over unions to promote reuse of 
materialized decompositions. ECML PKDD ’14, pages 688-704, 2014. 

[18] A. Lingas. A fast output-sensitive algorithm for boolean matrix multiplication. Algorithmica, pages 
408-419, 2011. 

[19] P. Miettinen. Boolean Tensor Factorizations. In ICDM ’11, pages 447-456, 2011.

[20] A. Netz, S. Chaudhuri, U. Fayyad, and J. Bernhardt. Integrating data mining with SQL databases: 
OLE DB for data mining. In ICDE ’01, pages 379-387, 2001. 

[21] T. Neumann and G. Moerkotte. Characteristic sets: Accurate cardinality estimation for RDF queries
with multiple joins. In ICDE ’11, pages 984-994, 2011.




[22] T. Neumann and G. Weikum. The RDF-3X engine for scalable management of RDF data. The
VLDB Journal, 19(1):91-113, Sept. 2009.

[23] M. Nickel, V. Tresp, and H.-P. Kriegel. Factorizing YAGO: Scalable Machine Learning for Linked
Data. In WWW ’12, pages 271-280, 2012.

[24] S. Sarawagi, S. Thomas, and R. Agrawal. Integrating Association Rule Mining with Relational
Database Systems: Alternatives and Implications. Data Min. Knowl. Discov., 4(2-3):89-125, July
2000.




